
    Toward a Principle-Based Translator

    A principle-based computational model of natural language translation consists of two components: (1) a module which makes use of a set of principles and parameters to transform the source language into an annotated surface form that can be easily converted into a "base" syntactic structure; and (2) a module which makes use of the same set of principles, but a different set of parameter values, to transform the "base" syntactic structure into the target language surface structure. This proposed scheme of language translation is an improvement over existing schemes since it is based on interactions between principles and parameters rather than on complex interactions between language-specific rules as found in older schemes. The background for research of the problem includes: an examination of existing schemes of computerized language translation and an analysis of their shortcomings. Construction of the proposed scheme requires a preliminary investigation of the common "universal" principles and parametric variations across different languages within the framework of current linguistic theory. The work to be done includes: construction of a module which uses linguistic principles and source language parameter values to parse and output the corresponding annotated surface structures of source language sentences; creation of procedures which handle the transformation of an annotated surface structure into a "base" syntactic structure; and development of a special purpose generation scheme which converts a "base" syntactic structure into a surface form in the target language. (MIT Artificial Intelligence Laboratory)
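
    A minimal sketch of the two-module organization described above, assuming hypothetical parameter names (head_initial, null_subject) and toy tree structures; the actual principles and parameter inventory of the proposed translator are not reproduced here.

```python
# Illustrative sketch only: one set of "universal" principles, two sets of
# parameter values, shared by the parsing and generation modules.
from dataclasses import dataclass, field

@dataclass
class Parameters:
    """Language-specific parameter values (names are illustrative)."""
    head_initial: bool   # e.g. English-type: True, Korean-type: False
    null_subject: bool   # whether subjects may be dropped

@dataclass
class Node:
    """Toy stand-in for annotated surface / 'base' syntactic structures."""
    label: str
    children: list = field(default_factory=list)

def parse_to_base(sentence: str, params: Parameters) -> Node:
    """Module 1: principles + source parameters -> annotated surface form,
    then conversion into a 'base' syntactic structure (placeholder)."""
    return Node("IP", [Node(tok) for tok in sentence.split()])

def generate_surface(base: Node, params: Parameters) -> str:
    """Module 2: same principles, target parameter values -> surface string."""
    words = [child.label for child in base.children]
    # Toy parameter use: head-final targets reverse the linear order.
    return " ".join(words if params.head_initial else list(reversed(words)))

english = Parameters(head_initial=True, null_subject=False)
korean = Parameters(head_initial=False, null_subject=True)

base = parse_to_base("the system translates sentences", english)
print(generate_surface(base, korean))
```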

    LEXICALL: Lexicon Construction for Foreign Language Tutoring

    We focus on the problem of building large repositories of lexical conceptual structure (LCS) representations for verbs in multiple languages. One of the main results of this work is the definition of a relation between broad semantic classes and LCS meaning components. Our acquisition program---LEXICALL---takes, as input, the result of previous work on verb classification and thematic grid tagging, and outputs LCS representations for different languages. These representations have been ported into English, Arabic and Spanish lexicons, each containing approximately 9000 verbs. We are currently using these lexicons in operational foreign language tutoring and machine translation systems. (Also cross-referenced as UMIACS-TR-97-09)
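
    As a rough illustration of the acquisition idea (not LEXICALL's actual code or class inventory), one might map a verb's broad semantic class to a skeletal LCS template and instantiate it with the verb's thematic grid:

```python
# Hypothetical class-to-LCS-skeleton mapping; real LCS meaning components and
# the verb class inventory are far richer than this sketch suggests.
CLASS_TEMPLATES = {
    "motion": "(GO Thing-{agent} (PATH TOWARD Thing-{goal}))",
    "caused_change": "(CAUSE Thing-{agent} (GO-ident Thing-{theme} (TO State-{state})))",
}

def build_lcs(verb: str, semantic_class: str, thematic_grid: dict) -> str:
    """Instantiate the class template with the verb's thematic-grid arguments."""
    return f"{verb}: " + CLASS_TEMPLATES[semantic_class].format(**thematic_grid)

# The same class/grid pairing yields parallel entries across languages.
print(build_lcs("enter", "motion", {"agent": "X", "goal": "Y"}))
print(build_lcs("entrar", "motion", {"agent": "X", "goal": "Y"}))
```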

    Development of Cross-Linguistic Syntactic and Semantic Parameters for Parsing and Generation

    This document reports on research conducted at the University of Maryland for the Korean/English Machine Translation (MT) project. The translation approach adopted here is interlingual, i.e., a single underlying representation called Lexical Conceptual Structure (LCS) is used for both Korean and English. The primary focus of this investigation concerns the notion of 'parameterization', i.e., a mechanism that accounts for both syntactic and lexical-semantic distinctions between Korean and English. We present our assumptions about the syntactic structure of Korean-type languages vs. English-type languages and describe our investigation of syntactic parameterization for distinguishing between these two types of languages. We also present the details of the LCS structure and describe how this representation is parameterized so that it accommodates both languages. We address critical issues concerning interlingual machine translation such as locative postpositions and the dividing line between the interlingua and the knowledge representation. Difficulties in translation and transliteration of Korean are discussed, and complex morphological properties of Korean are presented. Finally, we describe recent work on lexical acquisition and conclude with a discussion of two hypotheses concerning semantic classification that are currently being tested. (Also cross-referenced as UMIACS-TR-94-26)
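
    For example, a single head-direction parameter is one way to capture an English-type vs. Korean-type contrast such as prepositions vs. locative postpositions; the sketch below is illustrative and uses made-up parameter names, not the project's actual parameter set.

```python
# Illustrative parameter table: linearization of an adposition phrase driven by
# a head-direction parameter (names and values are simplified for the sketch).
SYNTAX_PARAMS = {
    "english": {"head_final": False},   # prepositions: "to school"
    "korean":  {"head_final": True},    # postpositions: "hakkyo-ey" (school-to)
}

def linearize_adposition_phrase(adposition: str, noun: str, language: str) -> str:
    """Order an adposition and its complement according to the head parameter."""
    head_final = SYNTAX_PARAMS[language]["head_final"]
    return f"{noun}{adposition}" if head_final else f"{adposition} {noun}"

print(linearize_adposition_phrase("to", "school", "english"))   # to school
print(linearize_adposition_phrase("-ey", "hakkyo", "korean"))   # hakkyo-ey
```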

    UNITRAN: A Principle-Based Approach to Machine Translation

    Machine translation has been a particularly difficult problem in the area of Natural Language Processing for over two decades. Early approaches to translation failed in part because the interaction effects of complex phenomena made translation appear unmanageable. Later approaches to the problem have succeeded (although only bilingually), but are based on many language-specific rules of a context-free nature. This report presents an alternative approach to natural language translation that relies on principle-based descriptions of grammar rather than rule-oriented descriptions. The model that has been constructed is based on abstract principles as developed by Chomsky (1981) and several other researchers working within the "Government and Binding" (GB) framework. Thus, the grammar is viewed as a modular system of principles rather than a large set of ad hoc language-specific rules.
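
    The contrast with rule-based systems can be pictured as a small set of reusable constraint functions rather than many construction-specific rules; the sketch below uses GB principle names but placeholder checks, and does not reproduce UNITRAN's implementation.

```python
# Principles as modular, language-independent constraints over candidate
# structures (placeholder checks only; not UNITRAN's actual machinery).
def case_filter(structure: dict) -> bool:
    """Every overt NP must receive Case."""
    return all(np.get("case") is not None for np in structure.get("nps", []))

def theta_criterion(structure: dict) -> bool:
    """Every argument NP must bear a theta-role."""
    return all(np.get("theta") is not None for np in structure.get("nps", []))

PRINCIPLES = [case_filter, theta_criterion]

def well_formed(structure: dict) -> bool:
    """A structure is admitted only if it satisfies every principle."""
    return all(principle(structure) for principle in PRINCIPLES)

candidate = {"nps": [{"case": "NOM", "theta": "agent"},
                     {"case": "ACC", "theta": "theme"}]}
print(well_formed(candidate))  # True
```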

    Knowledge Graphs Effectiveness in Neural Machine Translation Improvement

    Neural Machine Translation (NMT) systems require a massive amount of training data. Maintaining semantic relations between words during the translation process yields more accurate target-language output from NMT. Although difficult to achieve from training data alone, it is possible to leverage Knowledge Graphs (KGs) to retain source-language semantic relations in the corresponding target-language translation. The core idea is to use KG entity relations as embedding constraints to improve the mapping from source to target. This paper describes two embedding constraints, both of which employ Entity Linking (EL)---assigning a unique identity to entities---to associate words in training sentences with those in the KG: (1) a monolingual embedding constraint that supports an enhanced semantic representation of the source words through access to relations between entities in a KG; and (2) a bilingual embedding constraint that forces entity relations in the source language to be carried over to the corresponding entities in the target-language translation. The method is evaluated for English-Spanish translation, exploiting Freebase as a source of knowledge. Our experimental results show that exploiting KG information not only decreases the number of unknown words in the translation but also improves translation quality.
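
    The two constraints can be pictured as auxiliary penalty terms added to the NMT training objective; the sketch below uses toy numpy vectors and made-up entity names, and does not reproduce the paper's exact formulation.

```python
# Rough sketch of the two constraint ideas as auxiliary penalties. Entity
# linking, the KG, and the NMT loss itself are assumed to exist elsewhere.
import numpy as np

def mono_constraint(word_vec, linked_entity_vec):
    """Monolingual: pull a source word embedding toward its linked KG entity."""
    return np.sum((word_vec - linked_entity_vec) ** 2)

def bi_constraint(src_head, src_tail, tgt_head, tgt_tail):
    """Bilingual: the offset (relation) between two source-side entities should
    be mirrored by the corresponding target-side entities."""
    return np.sum(((src_tail - src_head) - (tgt_tail - tgt_head)) ** 2)

rng = np.random.default_rng(0)
e = {name: rng.normal(size=8) for name in
     ["w_src", "ent_src", "madrid_en", "spain_en", "madrid_es", "espana_es"]}

aux_loss = (mono_constraint(e["w_src"], e["ent_src"])
            + bi_constraint(e["madrid_en"], e["spain_en"],
                            e["madrid_es"], e["espana_es"]))
print(f"auxiliary constraint loss: {aux_loss:.3f}")
```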

    Handling Translation Divergences in Generation-Heavy Hybrid Machine Translation

    This paper describes a novel approach for handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The approach depends on the existence of rich target-language resources such as word lexical semantics, including information about categorial variations and subcategorization frames. These resources are used to generate multiple structural variations from a target-glossed lexico-syntactic representation of the source-language sentence. The multiple structural variations account for different translation divergences. The overgeneration of the approach is constrained by a target-language model using corpus-based statistics. The exploitation of target-language resources (symbolic and statistical) to handle a problem usually reserved for Transfer and Interlingual MT is useful for translation from structurally divergent source languages with scarce linguistic resources. A preliminary evaluation of the application of this approach to Spanish-English MT shows it to be extremely promising. The approach, however, is not limited to MT, as it can be extended to monolingual NLG applications such as summarization. (Also UMIACS-TR-2002-23 and LAMP-TR-08)
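
    A toy picture of the overgenerate-and-rank idea, using a classic manner/path divergence; the candidate sentences and the unigram "language model" below are stand-ins for the system's symbolic generation and corpus-based statistics.

```python
# Sketch: produce several structural variants for a divergent construction,
# then let target-language statistics pick the preferred realization.
# e.g. Spanish "entrar ... flotando" vs. English conflated "float into".
candidates = [
    "the bottle entered the cave floating",
    "the bottle floated into the cave",
]

# Toy unigram scores standing in for a corpus-based target-language model.
UNIGRAM_LOGPROB = {"floated": -4.0, "into": -2.5, "entered": -5.5,
                   "floating": -6.0, "the": -1.0, "bottle": -7.0, "cave": -7.5}

def lm_score(sentence: str) -> float:
    return sum(UNIGRAM_LOGPROB.get(w, -12.0) for w in sentence.split())

best = max(candidates, key=lm_score)
print(best)  # the statistically preferred structural variant
```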

    Efficient Parsing for Korean and English: A Parameterized Message Passing Approach

    This article presents an efficient, implemented approach to cross-linguistic parsing based on Government-Binding (GB) Theory (Chomsky, 1986) and subsequent work. One of the drawbacks of alternative GB-based parsing approaches is that they generally adopt a filter-based paradigm. These approaches typically generate all possible candidate structures of the sentence that satisfy X-bar theory, and then subsequently apply filters in order to eliminate those structures that violate GB principles. (See, for example, (Abney, 1989; Correa, 1991; Dorr, 1993; Fong, 1991).) The current approach provides an alternative to filter-based designs which avoids these difficulties by applying principles to descriptions of structures, rather than generating all candidate structures and filtering them afterward.
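
    The difference from the filter-based paradigm can be sketched as pruning partial descriptions as soon as a principle is violated, instead of building every complete candidate first; the attachment options and the constraint below are invented for illustration and are not the message-passing algorithm itself.

```python
# Toy contrast with filter-based parsing: principles are applied to partial
# descriptions as words are attached, pruning violations immediately.
def attachment_options(word: str) -> list:
    """Hypothetical attachment choices for a word (drastically simplified)."""
    return ["subject", "object"] if word in ("dogs", "cats") else ["head", "modifier"]

def violates_principles(partial: list) -> bool:
    """Placeholder principle check, e.g. at most one subject attachment."""
    return partial.count("subject") > 1

def parse_incremental(words: list) -> list:
    analyses = [[]]
    for word in words:
        extended = [a + [att] for a in analyses for att in attachment_options(word)]
        # Principles applied now, to descriptions, rather than to finished trees.
        analyses = [a for a in extended if not violates_principles(a)]
    return analyses

print(parse_incremental(["dogs", "cats", "chase"]))
```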

    LonXplain: Lonesomeness as a Consequence of Mental Disturbance in Reddit Posts

    Social media is a potential source of information from which latent mental states can be inferred through Natural Language Processing (NLP). While narrating real-life experiences, social media users convey their feelings of loneliness or an isolated lifestyle, which impacts their mental well-being. Existing literature on psychological theories points to loneliness as the major consequence of interpersonal risk factors, underscoring the need to investigate loneliness as a major aspect of mental disturbance. We formulate lonesomeness detection in social media posts as an explainable binary classification problem, identifying at-risk users and suggesting the need for resilience-building as early intervention. To the best of our knowledge, there is no existing explainable dataset, i.e., one with human-readable, annotated text spans, to facilitate further research and development in detecting loneliness that causes mental disturbance. In this work, three experts (a senior clinical psychologist, a rehabilitation counselor, and a social NLP researcher) define annotation schemes and perplexity guidelines to mark the presence or absence of lonesomeness, along with text spans in the original posts as explanations, in 3,521 Reddit posts. We plan to publicly release our dataset, LonXplain, along with traditional classifiers as baselines, via GitHub.
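
    One might store such annotations as records pairing a binary label with explanation spans; the field names and example below are illustrative, not the released schema of LonXplain.

```python
# Hypothetical record layout for an explainable loneliness-detection corpus.
import json

record = {
    "post_id": "example_001",
    "text": "Nobody checks on me anymore and I feel completely alone.",
    "lonesomeness": 1,                 # binary label
    "explanation_spans": [[32, 55]],   # character offsets of an annotated span
}

start, end = record["explanation_spans"][0]
print(json.dumps(record, indent=2))
print("span text:", record["text"][start:end])  # "I feel completely alone"
```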

    On automatic filtering of multilingual texts

    An emerging requirement to sift through the increasing flood of text information has led to the rapid development of information filtering technology in the past five years. This study introduces novel approaches for filtering texts regardless of their source language. We begin with a brief description of related developments in text filtering and multilingual information retrieval. We then present three alternative approaches to selecting texts from a multilingual information stream which represent a logical evolution from existing techniques in related disciplines. Finally, a practical automated performance evaluation technique is proposed.
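
    None of the paper's three approaches is reproduced here; purely to illustrate the task setting, the sketch below filters a multilingual stream against a profile using character n-grams, which avoids language-specific tokenization.

```python
# Illustration of language-independent filtering: score incoming documents
# against a profile via cosine similarity over character n-grams.
from collections import Counter
import math

def char_ngrams(text: str, n: int = 3) -> Counter:
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[g] * b[g] for g in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

profile = char_ngrams("machine translation interlingua lexical semantics")
stream = ["traduccion automatica basada en interlingua",    # Spanish, relevant
          "recette de cuisine pour un gateau au chocolat"]  # French, irrelevant

for doc in stream:
    print(f"{cosine(profile, char_ngrams(doc)):.3f}  {doc}")
```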